Estimating statistical significance of sequence alignments.
Author
Abstract
Algorithms that compare two protein or DNA sequences and produce an alignment of the best-matching segments are widely used in molecular biology. When random sequences of length n are compared, the scores these algorithms produce grow in proportion either to n or to log(n), depending on the algorithm parameters. In the linear case, the Azuma-Hoeffding inequality gives an upper bound on the probability that the score deviates far from its mean. In the logarithmic case, Poisson approximation can be applied.
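As a rough illustration of the two regimes named in the abstract, the sketch below evaluates the generic forms of both estimates. It is not taken from the paper: the function names, the number m of independent letters, the per-letter bound c on how much one letter can change the score, and the constants K and lam are all assumptions supplied by the caller. The first function is the standard one-sided Azuma-Hoeffding bound for a quantity whose value changes by at most c when any one of m independent inputs changes. The second assumes that the number of segments scoring at least t is approximately Poisson with mean K * m * n * exp(-lam * t), the usual form for ungapped local alignment.

```python
import math


def azuma_hoeffding_upper_bound(t, m, c):
    """Upper bound on P(S - E[S] >= t) in the linear regime.

    Assumes the score S depends on m independent letters and that
    changing a single letter alters S by at most c (both values are
    assumptions chosen by the caller, not constants from the paper).
    """
    if t < 0 or m <= 0 or c <= 0:
        raise ValueError("need t >= 0, m > 0 and c > 0")
    return math.exp(-(t * t) / (2.0 * m * c * c))


def poisson_tail_estimate(t, m, n, K, lam):
    """Approximate P(best local score >= t) in the logarithmic regime.

    Assumes the count of segments scoring at least t is roughly Poisson
    with mean K * m * n * exp(-lam * t); K and lam depend on the scoring
    scheme and must be supplied by the caller.
    """
    if m <= 0 or n <= 0 or K <= 0 or lam <= 0:
        raise ValueError("need m, n, K and lam to be positive")
    mean_count = K * m * n * math.exp(-lam * t)
    # P(at least one such segment) = 1 - exp(-mean_count)
    return -math.expm1(-mean_count)


if __name__ == "__main__":
    n = 100  # hypothetical sequence length
    # Two sequences of length n give 2 * n independent letters;
    # c = 2.0 is a hypothetical per-letter bound on the score change.
    print(azuma_hoeffding_upper_bound(t=0.5 * n, m=2 * n, c=2.0))
    # K = 0.1 and lam = 0.3 are placeholder values for illustration only.
    print(poisson_tail_estimate(t=40, m=n, n=n, K=0.1, lam=0.3))
```

The Azuma-Hoeffding bound holds for every sequence length but is often loose, whereas the Poisson estimate is an approximation whose accuracy depends on the scoring parameters.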
Similar resources
Estimating Pairwise Statistical Significance of Protein Local Alignments Using a Clustering-Classification Approach Based on Amino Acid Composition
A central question in pairwise sequence comparison is assessing the statistical significance of the alignment. The alignment score distribution is known to follow an extreme value distribution with analytically calculable parameters K and λ for ungapped alignments with one substitution matrix. But no statistical theory is currently available for the gapped case and for alignments using multiple...
Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
MOTIVATION Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring-or scoring-the rela...
Statistical significance in biological sequence analysis
One of the major goals of computational sequence analysis is to find sequence similarities, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations among the sequences. Since the degree of similarity is usually assessed by the sequence alignment score, it is necessary to know if a score is high enough to indicate a biologically interesting a...
Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments.
A power-law distribution of the length of perfectly conserved sequence from mouse/human whole-genome intersection and alignment is exhibited. Spatial correlations of these elements within the mouse genome are studied. It is argued that these power-law distributions and correlations are comprised in part by functional noncoding sequence and ought to be accounted for in estimating the statistical...
COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance.
We present a novel method for the comparison of multiple protein alignments with assessment of statistical significance (COMPASS). The method derives numerical profiles from alignments, constructs optimal local profile-profile alignments and analytically estimates E-values for the detected similarities. The scoring system and E-value calculation are based on a generalization of the PSI-BLAST ap...
Erasing errors due to alignment ambiguity when estimating positive selection.
Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statis...
Journal title:
- Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
Volume 344, Issue 1310
Pages -
Publication date 1994